{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "3.13. 丢弃法.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "TPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/chongzicbo/Dive-into-Deep-Learning-tf.keras/blob/master/3.13.%20%E4%B8%A2%E5%BC%83%E6%B3%95.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ur9ATKcoMJm7",
        "colab_type": "text"
      },
      "source": [
        "##3.13. 丢弃法\n",
        "除了前一节介绍的权重衰减以外，深度学习模型常常使用丢弃法（dropout）[1] 来应对过拟合问题。丢弃法有一些不同的变体。本节中提到的丢弃法特指倒置丢弃法（inverted dropout）。\n",
        "\n",
        "###3.13.1. 方法\n",
        "回忆一下，“多层感知机”一节的图3.3描述了一个单隐藏层的多层感知机。其中输入个数为4，隐藏单元个数为5，且隐藏单元 $h_i$\n",
        "$i=1, \\ldots, 5$的计算表达式为$$\n",
        "h_i = \\phi\\left(x_1 w_{1i} + x_2 w_{2i} + x_3 w_{3i} + x_4 w_{4i} + b_i\\right),\n",
        "$$\n",
        "这里$\\phi$是激活函数，$x_1, \\ldots, x_4$是输入，隐藏单元$i$的权重参数为$w_{1i}, \\ldots, w_{4i}$,偏差参数为$b_i$。当对该隐藏层使用丢弃法时，该层的隐藏单元将有一定概率被丢弃掉。设丢弃概率为 $p$ ， 那么有 $p$ 的概率 $h_i$ 会被清零，有 1-$p$ 的概率 $h_i$ 会除以 1−$p$ 做拉伸。丢弃概率是丢弃法的超参数。具体来说，设随机变量$\\xi_i$为0和1的概率分别为 $p$ 和 1−$p$ 。使用丢弃法时我们计算新的隐藏单元 $h^′_i$\n",
        "$$\n",
        "h_i' = \\frac{\\xi_i}{1-p} h_i.\n",
        "$$\n",
        "由于$E(\\xi_i) = 1-p$,因此\n",
        "$$\n",
        "E(h_i') = \\frac{E(\\xi_i)}{1-p}h_i = h_i.\n",
        "$$\n",
        "\n",
        "即丢弃法不改变其输入的期望值。让我们对图3.3中的隐藏层使用丢弃法，一种可能的结果如图3.5所示，其中 $h_2$ 和 $h_5$ 被清零。这时输出值的计算不再依赖 $h_2$ 和 $h_5$ ，在反向传播时，与这两个隐藏单元相关的权重的梯度均为0。由于在训练中隐藏层神经元的丢弃是随机的，即 $h_1, \\ldots, h_5$ 都有可能被清零，输出层的计算无法过度依赖 $h_1, \\ldots, h_5$ 中的任一个，从而在训练模型时起到正则化的作用，并可以用来应对过拟合。在测试模型时，我们为了拿到更加确定性的结果，一般不使用丢弃法。\n",
        "<div align=center><img src=\"https://zh.gluon.ai/_images/dropout.svg\" width=\"300\"/></div>\n",
        "<center>图 3.5 隐藏层使用了丢弃法的多层感知机</center>"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Xmagn8L2VimD",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import tensorflow as tf\n",
        "from tensorflow import keras\n",
        "from tensorflow.keras import losses\n",
        "from tensorflow.keras import layers,Sequential\n",
        "import numpy as np\n",
        "import tensorflow.data as tfdata"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "h6QucNtbWKTt",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "tf.enable_eager_execution()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "77lYeHVTPYDM",
        "colab_type": "text"
      },
      "source": [
        "###3.13.2. 从零开始实现\n",
        "根据丢弃法的定义，我们可以很容易地实现它。下面的dropout函数将以drop_prob的概率丢弃输入X中的元素。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "_wfzmMIAWNTj",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def dropout(X,drop_prob):\n",
        "  assert 0<= drop_prob<=1\n",
        "  keep_prob=1-drop_prob\n",
        "  #这种情况下把全部元素丢弃\n",
        "  if keep_prob==0:\n",
        "    return tf.zeros_like(X)\n",
        "  # mask=tf.random.uniform(shape=(tf.shape(X)),minval=0,maxval=1).numpy()<keep_prob\n",
        "  mask=tf.less(tf.random.uniform(shape=(tf.shape(X)),minval=0,maxval=1),y=tf.constant(keep_prob,dtype=tf.float32))\n",
        "  return tf.cast(mask,dtype=tf.float32) * X/keep_prob"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IahnVLz8PiNr",
        "colab_type": "text"
      },
      "source": [
        "我们运行几个例子来测试一下dropout函数。其中丢弃概率分别为0、0.5和1。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Mueil4b6W-yT",
        "colab_type": "code",
        "outputId": "b5c9ccf2-4b26-4681-e5af-c7955e4bed16",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 125
        }
      },
      "source": [
        "X=tf.reshape(tf.range(16,dtype=tf.float32),shape=(2,8))\n",
        "print(X)\n",
        "dropout(X,0.0)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "tf.Tensor(\n",
            "[[ 0.  1.  2.  3.  4.  5.  6.  7.]\n",
            " [ 8.  9. 10. 11. 12. 13. 14. 15.]], shape=(2, 8), dtype=float32)\n"
          ],
          "name": "stdout"
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<tf.Tensor: id=18, shape=(2, 8), dtype=float32, numpy=\n",
              "array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],\n",
              "       [ 8.,  9., 10., 11., 12., 13., 14., 15.]], dtype=float32)>"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 4
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "kpjEs4quXM1k",
        "colab_type": "code",
        "outputId": "d27ca452-d65b-4c9a-c641-f9bda346686e",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 71
        }
      },
      "source": [
        "dropout(X,0.5)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<tf.Tensor: id=31, shape=(2, 8), dtype=float32, numpy=\n",
              "array([[ 0.,  0.,  4.,  6.,  0.,  0., 12., 14.],\n",
              "       [16., 18., 20., 22., 24., 26.,  0., 30.]], dtype=float32)>"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 5
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "WBy3tNV2a75F",
        "colab_type": "code",
        "outputId": "4bfb7c75-e517-4ad2-e493-c1ada382a244",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 71
        }
      },
      "source": [
        "dropout(X,1)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<tf.Tensor: id=32, shape=(2, 8), dtype=float32, numpy=\n",
              "array([[0., 0., 0., 0., 0., 0., 0., 0.],\n",
              "       [0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 6
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "s6zgggfcPp2Y",
        "colab_type": "text"
      },
      "source": [
        "####3.13.2.1. 定义模型参数\n",
        "实验中，我们依然使用“softmax回归的从零开始实现”一节中介绍的Fashion-MNIST数据集。我们将定义一个包含两个隐藏层的多层感知机，其中两个隐藏层的输出个数都是256。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hS8ndXebbB-c",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "num_inputs,num_outputs,num_hiddens1,num_hiddens2=784,10,256,256\n",
        "\n",
        "W1=tf.Variable(tf.random.normal(shape=(num_inputs,num_hiddens1),stddev=0.01))\n",
        "b1=tf.Variable(tf.zeros(num_hiddens1))\n",
        "\n",
        "W2=tf.Variable(tf.random.normal(shape=(num_hiddens1,num_hiddens2),stddev=0.01))\n",
        "b2=tf.Variable(tf.zeros(num_hiddens2))\n",
        "\n",
        "W3=tf.Variable(tf.random.normal(shape=(num_hiddens2,num_outputs),stddev=0.01))\n",
        "b3=tf.Variable(tf.zeros(num_outputs))\n",
        "\n",
        "params=[W1,b1,W2,b2,W3,b3]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a5WpOmDXP1IN",
        "colab_type": "text"
      },
      "source": [
        "####3.13.2.2. 定义模型\n",
        "下面定义的模型将全连接层和激活函数ReLU串起来，并对每个激活函数的输出使用丢弃法。我们可以分别设置各个层的丢弃概率。通常的建议是把靠近输入层的丢弃概率设得小一点。在这个实验中，我们把第一个隐藏层的丢弃概率设为0.2，把第二个隐藏层的丢弃概率设为0.5。我们可以通过is_training参数来判断运行模式为训练还是测试，并只需在训练模式下使用丢弃法。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "BCSv8eFKe3jD",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "drop_prob1,drop_prob2=0.2,0.5\n",
        "def net(X,is_training=True):\n",
        "  # X=tf.reshape(X,shape=(-1,num_inputs))\n",
        "  H1=keras.activations.relu(tf.matmul(X,W1)+b1)\n",
        "  if is_training:\n",
        "    H1=dropout(H1,drop_prob1)\n",
        "  H2=keras.activations.relu(tf.matmul(H1,W2)+b2)\n",
        "  if is_training:\n",
        "    H2=dropout(H2,drop_prob2)\n",
        "  return tf.matmul(H2,W3)+b3"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Th2bMgLiQFdN",
        "colab_type": "text"
      },
      "source": [
        "####3.13.2.3. 训练和测试模型\n",
        "这部分与之前多层感知机的训练和测试类似。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "d9Kc5LRthJzA",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "batch_size=256\n",
        "buffer_size=10000\n",
        "def load_data_fashion_mnist(batch_size,buffer_size):\n",
        "  (x_train,y_train),(x_test,y_test)=keras.datasets.fashion_mnist.load_data()\n",
        "  x_train=np.reshape(x_train,[x_train.shape[0],-1])#将三维张量reshape成二维\n",
        "  x_test=np.reshape(x_test,[x_test.shape[0],-1])\n",
        "  train_iter=tfdata.Dataset.from_tensor_slices((x_train,y_train)).map(lambda x,y:(x/255,y)).shuffle(buffer_size).batch(batch_size)\n",
        "  test_iter=tfdata.Dataset.from_tensor_slices((x_test,y_test)).map(lambda x,y:(x/255,y)).batch(batch_size)\n",
        "  return train_iter,test_iter"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "aEzBFk_uhcTe",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def sgd(params,loss,t,lr,batch_size):\n",
        "  for param in params:\n",
        "    dl_dp=t.gradient(loss,param) #求梯度\n",
        "    param.assign_sub(lr*dl_dp/batch_size) #更新梯度\n",
        "\n",
        "def evaluate_accuracy(data_iter,net):\n",
        "  acc_sum,n=0.0,0\n",
        "  for X,y in data_iter:\n",
        "    # y=tf.cast(y,tf.float32)\n",
        "    acc_sum+=tf.equal(tf.cast(tf.argmax(net(X,is_training=False),axis=1),tf.float32),tf.cast(y,tf.float32)).numpy().sum()\n",
        "    n+=tf.shape(y)[0].numpy()\n",
        "  return acc_sum/n    \n",
        "def train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,params=None,lr=None,trainer=None):\n",
        "  for epoch in range(num_epochs):\n",
        "    train_l_sum,train_acc_sum,n=0.0,0.0,0\n",
        "    for X,y in train_iter:\n",
        "      if trainer is None:\n",
        "        with tf.GradientTape(persistent=True) as t:\n",
        "          y_hat=net(X)\n",
        "          l=tf.reduce_sum(loss(y,y_hat))\n",
        "        sgd(params,l,t,lr,batch_size)\n",
        "      else:\n",
        "        y_hat=net(X)\n",
        "        l=tf.reduce_sum(loss(y_hat,y))\n",
        "        trainer.minimize(lambda:loss(net(X),y),global_step=tf.train.get_or_create_global_step())\n",
        "      train_l_sum+=l.numpy()\n",
        "      train_acc_sum+=tf.equal(tf.cast(tf.argmax(y_hat,axis=1),tf.float32),tf.cast(y,tf.float32)).numpy().sum()\n",
        "      n+=tf.shape(y)[0]\n",
        "    test_acc=evaluate_accuracy(test_iter,net)\n",
        "    print('epoch %d,loss %.4f,train acc %.3f,test acc %.3f'%(epoch+1,train_l_sum/n,train_acc_sum/n,test_acc))  "
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "NEe5na8ygq4P",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "num_epochs,lr,batch_size=10,0.5,256\n",
        "buffer_size=10000\n",
        "loss=losses.SparseCategoricalCrossentropy(from_logits=True)\n",
        "train_iter,test_iter=load_data_fashion_mnist(batch_size=batch_size,buffer_size=buffer_size)\n",
        "train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,params=params,lr=lr)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qNAYRVP3QMy-",
        "colab_type": "text"
      },
      "source": [
        "###3.13.3. 简洁实现\n",
        "在tf.keras中，我们只需要在全连接层后添加Dropout层并指定丢弃概率。在训练模型时，Dropout层将以指定的丢弃概率随机丢弃上一层的输出元素；在测试模型时，Dropout层并不发挥作用。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "6L2sjZ7-iQdz",
        "colab_type": "code",
        "outputId": "6f5ed08d-d5a1-43da-c002-a0a8e6f2b8a5",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 575
        }
      },
      "source": [
        "train_iter,test_iter=load_data_fashion_mnist(batch_size=batch_size,buffer_size=buffer_size)\n",
        "net=Sequential()\n",
        "net.add(layers.Dense(256,activation='relu'))\n",
        "net.add(layers.Dropout(0.5))\n",
        "net.add(layers.Dense(256,activation='relu'))\n",
        "net.add(layers.Dropout(0.5))\n",
        "net.add(layers.Dense(10))\n",
        "net.compile(optimizer=keras.optimizers.SGD(learning_rate=0.05),loss=loss,metrics=['accuracy'])\n",
        "net.fit_generator(train_iter,validation_data=test_iter,epochs=num_epochs)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Epoch 1/10\n",
            "233/235 [============================>.] - ETA: 0s - loss: 1.1401 - acc: 0.4690Epoch 1/10\n",
            "235/235 [==============================] - 11s 48ms/step - loss: 1.1370 - acc: 0.4705 - val_loss: 0.6734 - val_acc: 0.7585\n",
            "Epoch 2/10\n",
            "234/235 [============================>.] - ETA: 0s - loss: 0.7376 - acc: 0.7230Epoch 1/10\n",
            "235/235 [==============================] - 11s 46ms/step - loss: 0.7373 - acc: 0.7231 - val_loss: 0.5757 - val_acc: 0.7936\n",
            "Epoch 3/10\n",
            "233/235 [============================>.] - ETA: 0s - loss: 0.6395 - acc: 0.7694Epoch 1/10\n",
            "235/235 [==============================] - 11s 47ms/step - loss: 0.6395 - acc: 0.7695 - val_loss: 0.5189 - val_acc: 0.8137\n",
            "Epoch 4/10\n",
            "233/235 [============================>.] - ETA: 0s - loss: 0.5895 - acc: 0.7874Epoch 1/10\n",
            "235/235 [==============================] - 11s 48ms/step - loss: 0.5891 - acc: 0.7875 - val_loss: 0.4930 - val_acc: 0.8201\n",
            "Epoch 5/10\n",
            "234/235 [============================>.] - ETA: 0s - loss: 0.5522 - acc: 0.8016Epoch 1/10\n",
            "235/235 [==============================] - 11s 46ms/step - loss: 0.5517 - acc: 0.8016 - val_loss: 0.4723 - val_acc: 0.8271\n",
            "Epoch 6/10\n",
            "233/235 [============================>.] - ETA: 0s - loss: 0.5261 - acc: 0.8167Epoch 1/10\n",
            "235/235 [==============================] - 11s 46ms/step - loss: 0.5260 - acc: 0.8167 - val_loss: 0.4573 - val_acc: 0.8323\n",
            "Epoch 7/10\n",
            "234/235 [============================>.] - ETA: 0s - loss: 0.5082 - acc: 0.8199Epoch 1/10\n",
            "235/235 [==============================] - 11s 46ms/step - loss: 0.5079 - acc: 0.8200 - val_loss: 0.4449 - val_acc: 0.8371\n",
            "Epoch 8/10\n",
            "233/235 [============================>.] - ETA: 0s - loss: 0.4915 - acc: 0.8258Epoch 1/10\n",
            "235/235 [==============================] - 11s 47ms/step - loss: 0.4913 - acc: 0.8258 - val_loss: 0.4356 - val_acc: 0.8389\n",
            "Epoch 9/10\n",
            "233/235 [============================>.] - ETA: 0s - loss: 0.4787 - acc: 0.8306Epoch 1/10\n",
            "235/235 [==============================] - 11s 47ms/step - loss: 0.4787 - acc: 0.8306 - val_loss: 0.4294 - val_acc: 0.8413\n",
            "Epoch 10/10\n",
            "234/235 [============================>.] - ETA: 0s - loss: 0.4681 - acc: 0.8345Epoch 1/10\n",
            "235/235 [==============================] - 11s 46ms/step - loss: 0.4680 - acc: 0.8345 - val_loss: 0.4223 - val_acc: 0.8466\n"
          ],
          "name": "stdout"
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<tensorflow.python.keras.callbacks.History at 0x7f1c9d94ad30>"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 34
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-tXM4AcNfvD5",
        "colab_type": "text"
      },
      "source": [
        "###3.13.4. 小结\n",
        "* 我们可以通过使用丢弃法应对过拟合。\n",
        "* 丢弃法只在训练模型时使用。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RrIQr0EOQcy8",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        ""
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}